organic molecule
Life on Earth may have come from cosmic dust
Amino acids may not have arrived on big space rocks after all. The scientific community is largely divided into two camps regarding the origins of life on Earth. One side holds that life spontaneously stemmed from the planet's primordial soup of amino acids and organic molecules. The other holds that life arrived after hitching a ride on interstellar debris.
QMe14S, A Comprehensive and Efficient Spectral Dataset for Small Organic Molecules
Yuan, Mingzhi, Zou, Zihan, Hu, Wei
Developing machine learning protocols for molecular simulations requires comprehensive and efficient datasets. Here we introduce the QMe14S dataset, comprising 186,102 small organic molecules featuring 14 elements (H, B, C, N, O, F, Al, Si, P, S, Cl, As, Se, Br) and 47 functional groups. Using density functional theory at the B3LYP/TZVP level, we optimized the geometries and calculated properties including energy, atomic charge, atomic force, dipole moment, quadrupole moment, polarizability, octupole moment, first hyperpolarizability, and Hessian. At the same level, we obtained the harmonic IR, Raman and NMR spectra. Furthermore, we conducted ab initio molecular dynamics simulations to generate dynamic configurations and extract nonequilibrium properties, including energy, forces, and Hessians. By leveraging our E(3)-equivariant message-passing neural network (DetaNet), we demonstrated that models trained on QMe14S outperform those trained on the previously developed QM9S dataset in simulating molecular spectra. The QMe14S dataset thus serves as a comprehensive benchmark for molecular simulations, offering valuable insights into structure-property relationships.
- North America > United States > Connecticut > New Haven County > Wallingford (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
OpenQDC: Open Quantum Data Commons
Gabellini, Cristian, Shenoy, Nikhil, Thaler, Stephan, Canturk, Semih, McNeela, Daniel, Beaini, Dominique, Bronstein, Michael, Tossou, Prudencio
Machine Learning Interatomic Potentials (MLIPs) are a highly promising alternative to force-fields for molecular dynamics (MD) simulations, offering precise and rapid energy and force calculations. However, Quantum-Mechanical (QM) datasets, crucial for MLIPs, are fragmented across various repositories, hindering accessibility and model development. We introduce the openQDC package, consolidating 37 QM datasets from over 250 quantum methods and 400 million geometries into a single, accessible resource. These datasets are meticulously preprocessed and standardized for MLIP training, covering a wide range of chemical elements and interactions relevant in organic chemistry. OpenQDC includes tools for normalization and integration, easily accessible via Python. Experiments with well-known architectures like SchNet, TorchMD-Net, and DimeNet reveal challenges for those architectures and constitute a leaderboard to accelerate benchmarking and guide the development of novel algorithms. Continuously adding datasets to OpenQDC will democratize QM dataset access, foster more collaboration and innovation, enhance MLIP development, and support the adoption of MLIPs in the MD field.
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning
Srinivas, Sakhinana Sagar, Runkana, Venkataramana
In the field of chemistry, the objective is to create novel molecules with desired properties, facilitating accurate property predictions for applications such as material design and drug screening. However, existing graph deep learning methods face limitations that curb their expressive power. To address this, we explore the integration of vast molecular domain knowledge from Large Language Models (LLMs) with the complementary strengths of Graph Neural Networks (GNNs) to enhance performance in property prediction tasks. We introduce a Multi-Modal Fusion (MMF) framework that synergistically harnesses the analytical prowess of GNNs and the linguistic generative and predictive abilities of LLMs, thereby improving accuracy and robustness in predicting molecular properties. Our framework combines the effectiveness of GNNs in modeling graph-structured data with the zero-shot and few-shot learning capabilities of LLMs, enabling improved predictions while reducing the risk of overfitting. Furthermore, our approach effectively addresses distributional shifts, a common challenge in real-world applications, and showcases the efficacy of learning cross-modal representations, surpassing state-of-the-art baselines on benchmark datasets for property prediction tasks.
Transfer Learning for Molecular Property Predictions from Small Data Sets
Kirschbaum, Thorren, Bande, Annika
Machine learning has emerged as a new tool in chemistry to bypass expensive experiments or quantum-chemical calculations, for example, in high-throughput screening applications. However, many machine learning studies rely on small data sets, making it difficult to efficiently implement powerful deep learning architectures such as message passing neural networks. In this study, we benchmark common machine learning models for the prediction of molecular properties on small data sets, for which the best results are obtained with the message passing neural network PaiNN, as well as SOAP molecular descriptors concatenated to a set of simple molecular descriptors tailored to gradient boosting with regression trees. To further improve the predictive capabilities of PaiNN, we present a transfer learning strategy that uses large data sets to pre-train the respective models and yields more accurate models after fine-tuning on the original data sets. The pre-training labels are obtained from computationally cheap ab initio or semi-empirical models and corrected by simple linear regression on the target data set to obtain labels that are close to those of the original data. This strategy is tested on the Harvard Organic Photovoltaic data set (HOPV, HOMO-LUMO gaps), for which excellent results are obtained, and on the FreeSolv data set (solvation energies), where this method is unsuccessful due to a complex underlying learning task and the dissimilar methods used to obtain pre-training and fine-tuning labels. Finally, we find that the final training results do not improve monotonically with the size of the pre-training data set, but pre-training with fewer data points can lead to more biased pre-trained models and higher accuracy after fine-tuning.
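The label-correction step described in this abstract can be illustrated with a minimal sketch. It assumes a single scalar property and a straight-line fit between cheap-method values and reference labels on the small target set; the function names, the toy numbers, and the use of `numpy.polyfit` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def fit_label_correction(cheap_labels, target_labels):
    """Least-squares fit of target ~= slope * cheap + intercept."""
    slope, intercept = np.polyfit(cheap_labels, target_labels, deg=1)
    return slope, intercept


def correct_labels(cheap_labels, slope, intercept):
    """Map cheap-method labels onto the scale of the target labels."""
    return slope * np.asarray(cheap_labels) + intercept


# Hypothetical toy data: a small target set that has both cheap-method
# values and reference labels, and a large set with cheap values only.
rng = np.random.default_rng(0)
cheap_small = rng.uniform(1.0, 4.0, size=50)
target_small = 0.8 * cheap_small + 0.5 + rng.normal(0.0, 0.05, size=50)
cheap_large = rng.uniform(1.0, 4.0, size=1000)

# Fit on the small target set, then correct the large pre-training set.
slope, intercept = fit_label_correction(cheap_small, target_small)
pretrain_labels = correct_labels(cheap_large, slope, intercept)
```

In this sketch the corrected `pretrain_labels` would serve as pre-training targets, and the small set with its reference labels would be used afterwards for fine-tuning.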
Has NASA finally found life on Mars? Perseverance collects key samples of Martian soil
NASA's Perseverance Rover has collected a sample of Martian rock, to be returned to Earth, which could contain signs of life. But don't get too excited yet, as this particular tube won't reach a terrestrial laboratory where it can be studied for another 10 years or so. The rover has been roaming Mars for almost a year now, looking for sampling sites that might contain ancient microbes and organics. In that time, it has completed the first of its four search campaigns, which focused on the crater floor and the base of the Neretva Vallis delta. Before coring, Perseverance chooses a sample by using its suite of onboard instruments to detect whether organic molecules are present in a rock.
- South America > Chile (0.05)
- North America > United States > California (0.05)
- Government > Space Agency (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Artificial Intelligence in Chemistry – Tajinder Singh
Artificial intelligence is a field of science concerned with building computers and machines that can reason, learn, and act in such a way that would normally require human intelligence or that involves data whose scale exceeds what humans can analyze. AI, based largely on machine-learning algorithms that can mine huge data sets for patterns and correlations, seems best regarded as an assistant to, rather than a replacement for, the human researcher. It can do an awful lot, especially when coupled to robotic systems: not just analyse data but plan and execute experiments, make iterative improvements and even formulate and test specific hypotheses. Little of this is yet routine in the laboratory, but it is becoming ever more so. In some ways, chemistry is ripe for AI colonisation.
- Health & Medicine (0.49)
- Leisure & Entertainment > Games > Chess (0.31)
NASA's Perseverance Rover deposits its first of 10 samples of Martian rock to be returned to Earth
NASA's Perseverance Rover has finally deposited its first sample of Martian rock to be returned to Earth. On April 22, the car-sized robot began its mission to find ancient biomarkers in the clay on the Red Planet, which could indicate if alien life ever existed there. It has been roaming around a delta to look for sampling sites that might contain ancient microbes and organics, before drilling down to extract a specimen. Most of those it has collected so far remain in its belly; however, this one is the first to be dropped at the base of the delta, and may be retrieved in a future mission. The titanium tube contains a core of igneous rock extracted on January 31 from a region of Mars' Jezero Crater called 'South Séítah'. Perseverance chooses a sample using its suite of onboard instruments to detect whether organic molecules are present in some rock before coring. Mars is the fourth planet from the sun, a 'near-dead' dusty, cold desert world with a very thin atmosphere.
- Government > Space Agency (1.00)
- Government > Regional Government > North America Government > United States Government (0.95)
Mars rover is yet to find 'perfect' rock sample almost two months into its search for past life
NASA's Perseverance rover has been aptly named because, nearly two months after beginning its search for past life on Mars, it has yet to find any viable samples. On April 22, the car-sized robot began its mission to find ancient biomarkers in the Martian clay, which could indicate if alien life ever existed on the Red Planet. It has been roaming around an ancient delta to look for sampling sites that might contain ancient microbes and organics. The rover then drills down to extract a specimen that it plans to leave at the base of the delta to be retrieved in future missions. However, NASA has since revealed that, so far, no samples have been successfully collected. The fragile clay materials the rover targets have been known to fracture, crack and crumble during the abrasion and coring process.
Computational modeling guides development of new materials
Metal-organic frameworks, a class of materials with porous molecular structures, have a variety of possible applications, such as capturing harmful gases and catalyzing chemical reactions. Made of metal atoms linked by organic molecules, they can be configured in hundreds of thousands of different ways. To help researchers sift through all of the possible metal-organic framework (MOF) structures and help identify the ones that would be most practical for a particular application, a team of MIT computational chemists has developed a model that can analyze the features of a MOF structure and predict if it will be stable enough to be useful. The researchers hope that these computational predictions will help cut the development time of new MOFs. "This will allow researchers to test the promise of specific materials before they go through the trouble of synthesizing them," says Heather Kulik, an associate professor of chemical engineering at MIT.
- Materials > Chemicals (0.52)
- Government > Regional Government > North America Government > United States Government (0.51)